21 research outputs found

    Design of 4-Bit 4-Tap FIR Filter Based on Quantum-Dot Cellular Automata (QCA) Technology with a Realistic Clocking Scheme

    The increasing demand for efficient signal processors necessitates the design of digital finite impulse response (FIR) filters that occupy less area and consume less power. FIR filters have simple, regular, and scalable structures. This paper presents the design and implementation of a low-power 4-tap FIR filter based on quantum-dot cellular automata (QCA) using a realistic clocking scheme. The QCADesigner software, widely used for QCA circuit design and verification, has been used to implement and verify all of the designs in this study. The power dissipation of the proposed circuit has been computed using the accurate QCADesigner-E software. The proposed QCA FIR filter achieves about a 97.74% reduction in power dissipation compared with previous designs. The outcome of this work can open up new opportunities for low-power signal processing systems.
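    As a reference for what the filter computes (not for its QCA implementation), a 4-tap FIR filter produces y[n] = h0*x[n] + h1*x[n-1] + h2*x[n-2] + h3*x[n-3]. The Python sketch below is a minimal bit-accurate behavioral model of such a direct-form filter, of the kind one might use to check simulated circuit outputs. It assumes 4-bit unsigned samples and coefficients; the function name fir_4tap and the coefficient values are hypothetical.

```python
from typing import List, Sequence


def fir_4tap(samples: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Direct-form 4-tap FIR: y[n] = sum_{k=0}^{3} h[k] * x[n-k].

    Assumption: samples and coefficients are 4-bit unsigned integers (0..15).
    Samples before the start of the input are treated as zero.
    """
    if len(coeffs) != 4:
        raise ValueError("a 4-tap filter needs exactly 4 coefficients")
    for v in list(samples) + list(coeffs):
        if not 0 <= v <= 15:
            raise ValueError(f"value {v} does not fit in 4 unsigned bits")

    delay_line = [0, 0, 0, 0]  # holds x[n], x[n-1], x[n-2], x[n-3]
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:3]  # shift the new sample in, drop the oldest
        outputs.append(sum(h * d for h, d in zip(coeffs, delay_line)))
    return outputs


# An impulse input returns the coefficients: the impulse response lasts only 4 samples.
print(fir_4tap([1, 0, 0, 0, 0], [3, 5, 7, 2]))  # [3, 5, 7, 2, 0]
```

    With 4-bit operands, each product fits in 8 bits and the sum of four products fits in 10 bits (4 * 15 * 15 = 900 < 1024), which bounds the output width a hardware implementation has to provide.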

    Modules placement technique under constraint of FPGA forbidden zones

    Temporal partitioning of data flow graphs for reconfigurable architectures

    Toward the Implementation of an ASIC-Like System on FPGA for Real-Time Video Processing with Power Reduction

    Driven by the importance of energy consumption as an evaluation factor in system-on-chip design, this paper presents a system-level design methodology to optimize power consumption on an ARM-based architecture for real-time video processing. The proposed design flow is based on the interaction between tool optimizations and user optimizations. The tool optimizations are the options and best practices available in the integrated design environment for Xilinx technology and the target Zynq-7000 architecture. The user optimizations are methods proposed by the user to reduce power consumption; for these, we applied voltage scaling and frequency scaling. These two techniques allow energy to be consumed in proportion to the work to be done. The suggested flow is applied to a real-time video processing system. The results show power savings of up to 60% while respecting performance and real-time constraints.
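    Why voltage and frequency scaling let energy follow the workload can be seen from the standard first-order model of CMOS dynamic power. This is a textbook approximation, not a model taken from the paper: alpha (switching activity), C_eff (effective switched capacitance), and N_cyc (clock cycles needed for a task) are introduced here for illustration, and static (leakage) power is ignored.

```latex
P_{\mathrm{dyn}} = \alpha\, C_{\mathrm{eff}}\, V_{DD}^{2}\, f,
\qquad
t_{\mathrm{task}} = \frac{N_{\mathrm{cyc}}}{f}
\;\Longrightarrow\;
E_{\mathrm{task}} = P_{\mathrm{dyn}}\, t_{\mathrm{task}}
                  = \alpha\, C_{\mathrm{eff}}\, V_{DD}^{2}\, N_{\mathrm{cyc}}
```

    Under this model, lowering f alone reduces power but leaves the energy per task unchanged, whereas lowering V_DD, which a lower f usually permits, reduces energy quadratically: running at 0.8 of the nominal V_DD gives 0.8^2 = 0.64 of the nominal energy per task.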

    High-level optimised systems design using hardware-software partitioning

    Efficient GPU Implementation of Lucas-Kanade through OpenACC

    Optical flow estimation is an essential component of motion detection and object tracking. It is an image processing algorithm typically composed of a series of convolution masks (approximations of the derivatives) followed by 2 × 2 linear systems for the optical flow vectors. Since each stage of the algorithm is a stencil computation, the overhead from memory accesses is expected to be significant and to become a genuine scalability bottleneck, especially given the complexity of the GPU memory configuration. In this paper, we investigate a GPU deployment of an optimized CPU implementation via OpenACC, a directive-based parallel programming model and framework that eases the porting of code to a wide variety of heterogeneous HPC hardware platforms and architectures. We explore each of the major technical features and strive to obtain the best performance. Experimental results on a Quadro P5000 are provided together with the corresponding technical discussion, taking the performance of the multicore version on an Intel Broadwell-EP as the baseline.
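    The two stages described above (derivative masks, then one 2 × 2 linear system per pixel) can be sketched for a single pixel as follows. This is a minimal pure-Python illustration, not the authors' CPU or OpenACC code: the function name lucas_kanade_at, the central-difference masks, the forward temporal difference, and the 3 × 3 window (half_win=1) are assumptions. Each output pixel reads a small neighborhood of both frames, which is the stencil access pattern behind the memory-traffic concern.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]  # row-major grayscale frame: image[y][x]


def lucas_kanade_at(I0: Image, I1: Image, r: int, c: int,
                    half_win: int = 1) -> Optional[Tuple[float, float]]:
    """Estimate the flow (u, v) at pixel (r, c) between frames I0 and I1.

    Assumptions: central-difference masks [-1/2, 0, 1/2] for the spatial
    derivatives, a forward difference for the temporal one, and a
    (2*half_win+1)^2 window kept at least one pixel away from the border.
    Returns None when the 2x2 system is singular (aperture problem).
    """
    sxx = sxy = syy = 0.0  # entries of the 2x2 structure tensor
    sxt = syt = 0.0        # right-hand side terms
    for y in range(r - half_win, r + half_win + 1):
        for x in range(c - half_win, c + half_win + 1):
            ix = (I0[y][x + 1] - I0[y][x - 1]) / 2.0  # d/dx stencil
            iy = (I0[y + 1][x] - I0[y - 1][x]) / 2.0  # d/dy stencil
            it = I1[y][x] - I0[y][x]                  # d/dt
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it

    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None
    # Cramer's rule for [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt]
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v


# Synthetic check: I1 is built so that brightness constancy holds exactly for (0.5, -0.25).
I0 = [[float(x * x + y * y) for x in range(8)] for y in range(8)]
I1 = [[I0[y][x] - (2 * x * 0.5 + 2 * y * -0.25) for x in range(8)] for y in range(8)]
print(lucas_kanade_at(I0, I1, 3, 4))  # (0.5, -0.25)
```

    A full implementation would repeat this for every pixel (ideally reusing the derivative images across overlapping windows), and it is that per-pixel loop nest a directive-based port would offload or parallelize.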

    An Ultra-Low Power Parity Generator Circuit Based on QCA Technology

    Quantum-dot cellular automata (QCA) technology is one of the emerging technologies that can be used to replace CMOS technology. It has attracted significant attention in recent years due to its extremely low power dissipation, high operating frequency, and small size. In this study, we demonstrate an n-bit parity generator circuit built with QCA technology. A novel XOR gate is used in the synthesis of the proposed circuit; the gate relies on electrostatic interactions between cells to perform the desired function. The comparison results show that the designed QCA circuits have advantages over other circuits in terms of cell count, area, delay, and power consumption. The QCADesigner software, widely used for QCA circuit design and verification, has been used to implement and verify all of the designs in this study. The power dissipation of the proposed circuit has been computed using the accurate QCAPro power estimator tool.
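    Functionally, assuming even parity, an n-bit parity generator outputs the XOR of all n data bits, so that the data bits plus the parity bit contain an even number of 1s. The Python sketch below models this at the gate level only, with 2-input XOR gates; it does not model QCA cells or clocking, and the names parity_chain and parity_tree are hypothetical. It contrasts a linear cascade (n - 1 gates in n - 1 levels) with a balanced tree (also n - 1 gates, but ceil(log2 n) levels), since the number of gate levels generally affects circuit delay.

```python
from functools import reduce
from operator import xor
from typing import List, Sequence, Tuple


def parity_chain(bits: Sequence[int]) -> int:
    """Even-parity bit as a linear cascade of 2-input XORs (0 for an empty input)."""
    return reduce(xor, bits, 0)


def parity_tree(bits: Sequence[int]) -> Tuple[int, int]:
    """Same function built as a balanced XOR tree; also returns the tree depth."""
    level: List[int] = list(bits)
    depth = 0
    while len(level) > 1:
        # XOR adjacent pairs: one level of 2-input gates.
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd leftover bit passes straight to the next level
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return (level[0] if level else 0), depth


print(parity_chain([1, 0, 1, 1]))  # 1
print(parity_tree([1, 0, 1, 1]))   # (1, 2)
```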

    Evaluation of an OpenMP Parallelization of Lucas-Kanade on a NUMA-Manycore

    The Lucas-Kanade algorithm is a well-known optical flow estimator widely used in image processing for motion detection and object tracking. As a typical image processing algorithm, the procedure is a series of convolution masks followed by 2 × 2 linear systems for the optical flow vectors. Since each stage of the algorithm is a stencil computation, the overhead from memory accesses is expected to be a serious scalability bottleneck, especially on a NUMA manycore configuration. The objective of this study is therefore to investigate an OpenMP parallelization of the Lucas-Kanade algorithm on a NUMA manycore, including the performance impact of NUMA-aware settings at runtime. Experimental results on a dual-socket Intel Broadwell-E/EP are provided together with the corresponding technical discussion.